A Additional Experimental Results
Robot action primitives are agnostic to the exact geometry of the underlying robot, provided the robot is a manipulator arm. As noted in the related work section, Dynamic Motion Primitives (DMPs) are an alternative skill formulation that is common in the robotics literature. Each primitive ran 200 low-level actions with a path length of five high-level actions, while the low-level path length was 500. With raw actions, each episode took 16.49 [...]. We run an ablation to measure how often RAPS uses each primitive.
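The usage ablation mentioned above amounts to counting how often each primitive is selected across episodes. The following is a minimal sketch of that bookkeeping, not the procedure used in the experiments: the function name `primitive_usage`, the primitive names, and the logged episodes are all hypothetical and serve only as an illustration.

```python
from collections import Counter


def primitive_usage(episodes):
    """Return the fraction of high-level steps that used each primitive.

    `episodes` is a list of episodes, each a list of primitive names
    (one name per high-level action).
    """
    counts = Counter(name for episode in episodes for name in episode)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: count / total for name, count in counts.items()}


# Hypothetical logs: two episodes of five high-level actions each.
# The primitive names are placeholders chosen for this sketch.
episodes = [
    ["lift", "move_delta", "close_gripper", "lift", "drop"],
    ["move_delta", "close_gripper", "lift", "lift", "open_gripper"],
]

usage = primitive_usage(episodes)
for name, fraction in sorted(usage.items(), key=lambda item: -item[1]):
    print(f"{name}: {fraction:.2f}")
```

Normalizing by the total number of high-level steps makes the fractions comparable across tasks with different episode lengths.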
7 Additional Experimental Results and Further Analysis
The descriptions of each model setup are provided in Section 8.2. The reason is that different types of agents have distinct behavior patterns or feasibility constraints. Compared to single-stage training, [...] the 4.0s NBA dataset to demonstrate the effect of different numbers of edge types and re-encoding gaps. More specifically, in the first case of Figure 7, the historical steps of the green-team player in the middle move forward quickly, yet our model successfully predicts that the player will suddenly stop, since he is surrounded by many opponents and is not carrying the ball. Such a case is very common in basketball games.
A Additional Experimental Results
Reward curves for TOP-RAD and RAD on pixel-based tasks from the DM Control Suite are shown in Figure 7. Each individual run was performed on a single GPU and lasted between 3 and 18 hours, depending on the task and GPU model.

Figure 7: Results across 10 seeds for DM Control tasks.

The procedures for updating the critics and the actor for TOP-TD3 are described in detail in Algorithm 2 and Algorithm 3.

Algorithm 2: UpdateCritics

To enable adaptation, we use an approach inspired by recent results in the literature on model selection for contextual bandits. Unlike in standard bandit problems, the "arm" choices in the model selection setting are not stationary arms but learning algorithms. The objective is to choose, in an online manner, the best algorithm for the task at hand. In Figure 5, we show this to be the case for Ant-v2.
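The idea of choosing online among candidate learning algorithms can be illustrated with a small bandit-style selector. The sketch below assumes an exponential-weights rule with importance-weighted feedback, in the spirit of EXP3. The text does not specify the update rule, so the rule itself, the class name `ExpWeightsSelector`, the learning rate, and the toy feedback signal are all assumptions made for illustration.

```python
import math
import random


class ExpWeightsSelector:
    """Exponential-weights selector over a fixed set of candidate arms.

    Each arm stands for a learning algorithm (or a configuration of one).
    This is an illustrative sketch, not the update rule from the paper.
    """

    def __init__(self, n_arms, learning_rate=0.1, seed=0):
        self.log_weights = [0.0] * n_arms
        self.learning_rate = learning_rate
        self.rng = random.Random(seed)

    def probabilities(self):
        # Subtract the maximum before exponentiating for numerical stability.
        top = max(self.log_weights)
        exp_weights = [math.exp(w - top) for w in self.log_weights]
        total = sum(exp_weights)
        return [w / total for w in exp_weights]

    def select(self):
        probs = self.probabilities()
        return self.rng.choices(range(len(probs)), weights=probs, k=1)[0]

    def update(self, arm, feedback):
        # Importance-weight the feedback by the probability of the chosen arm,
        # since only that arm's feedback is observed in each round.
        prob = self.probabilities()[arm]
        self.log_weights[arm] += self.learning_rate * feedback / prob


# Toy run with two hypothetical candidates; arm 1 always yields the better
# feedback. Real feedback would come from the learner's measured improvement
# and need not be stationary.
selector = ExpWeightsSelector(n_arms=2)
for _ in range(200):
    arm = selector.select()
    feedback = 1.0 if arm == 1 else 0.0
    selector.update(arm, feedback)

print(selector.probabilities())
```

Because the arms are learning algorithms whose performance changes as they train, practical variants typically feed back a change in performance, such as the improvement in episodic return, and not an absolute reward.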